54 research outputs found

    A bayesian classifier for the recognition of the impersonal occurrences of the 'it' pronoun

    This paper presents a new system that distinguishes between the impersonal and anaphoric occurrences of the pronoun 'it'. Compared with state-of-the-art methods, our system relies on the same types of linguistic knowledge but performs better. We argue that this is due to the Bayesian model on which it is based: it makes it possible to combine various pieces of knowledge and to exploit even unreliable ones when classifying pronoun occurrences.

    Identifier les pronoms anaphoriques et trouver leurs antécédents : l'intérêt de la classification bayésienne

    In NLP, a traditional distinction opposes linguistically based systems and knowledge-poor ones, which rely mainly on surface clues. Each approach has its drawbacks and advantages. In this paper, we propose a new approach based on Bayesian networks that makes it possible to combine both types of information. As a case study, we focus on anaphora resolution, which is known to be a difficult NLP problem. We show that our Bayesian system performs better than a state-of-the-art system on this task.

    A Bayesian approach combining surface clues and linguistic knowledge: Application to the anaphora resolution problem

    In NLP, a traditional distinction opposes linguistically based systems and knowledge-poor ones, which rely mainly on surface clues. Each approach has its drawbacks and advantages. In this paper, we propose a new method that is based on Bayesian networks and makes it possible to combine both types of information. As a case study, we focus on the specific task of pronominal anaphora resolution, which is known to be a difficult NLP problem. We show that our Bayesian system performs better than state-of-the-art anaphora resolution systems.

    The ALVIS Format for Linguistically Annotated Documents

    The paper describes the ALVIS annotation format, designed for indexing large collections of documents in topic-specific search engines. The format is illustrated on the biological domain and on MedLine abstracts, since developing a specialized search engine for biologists is one of the ALVIS case studies. The ALVIS principle for linguistic annotations builds on existing work and proposed standards. We chose stand-off annotations rather than inserted mark-up. Annotations are encoded as XML elements, which form the linguistic subsection of the document record.
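
    To illustrate the general idea of stand-off annotation mentioned in this abstract, here is a minimal Python sketch. It is not the ALVIS schema: the element names (`linguisticAnalysis`, `annotation`, `start`, `end`), the labels, and the toy sentence are all hypothetical, chosen only to show that annotations can live apart from the text and point into it by character offset.

```python
import xml.etree.ElementTree as ET


def span_of(text, phrase, label):
    """Locate `phrase` in `text` and return (start, end, label) character offsets."""
    start = text.index(phrase)  # raises ValueError if the phrase is absent
    return (start, start + len(phrase), label)


def build_standoff(text, spans):
    """Build an XML element whose children point into `text` by offset.

    The source text is never modified; this is what distinguishes
    stand-off annotation from inserted (inline) mark-up.
    All element and attribute names here are hypothetical.
    """
    root = ET.Element("linguisticAnalysis")
    for i, (start, end, label) in enumerate(spans):
        if not (0 <= start < end <= len(text)):
            raise ValueError(f"span {start}-{end} is outside the text")
        ann = ET.SubElement(root, "annotation", id=f"a{i}", type=label)
        ET.SubElement(ann, "start").text = str(start)
        ET.SubElement(ann, "end").text = str(end)
    return root


def resolve(text, root):
    """Recover (label, covered text) pairs by slicing the untouched text."""
    pairs = []
    for ann in root.iter("annotation"):
        start = int(ann.findtext("start"))
        end = int(ann.findtext("end"))
        pairs.append((ann.get("type"), text[start:end]))
    return pairs


# Toy sentence and labels, invented for this example.
text = "Protein X activates gene Y."
spans = [span_of(text, "Protein X", "protein"), span_of(text, "gene Y", "gene")]
root = build_standoff(text, spans)

print(ET.tostring(root, encoding="unicode"))
print(resolve(text, root))
```

    Because the annotation layer stores only offsets, several independent layers (for example tokens, named entities, and syntactic relations) could reference the same unchanged text, which is one common motivation for choosing stand-off over inline mark-up.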

    Which Factors Contributes to Resolving Coreference Chains with Bayesian Networks?

    This paper describes coreference chain resolution with Bayesian networks. Several factors in the resolution of coreference chains may greatly affect the final performance. While the choice of machine learning algorithm and the features the learner relies on have been widely addressed by the community, other factors involved in the resolution, such as noisy features, anaphoricity resolution, or the search window, have been less studied, and their importance remains unclear. In this article, we describe a mention-pair resolver using Bayesian networks, targeting coreference resolution in discharge summaries. We present a study of the contributions of a comprehensive set of factors involved in the resolution, using the 2011 i2b2/VA challenge data set. The results of our study indicate that, besides the use of noisy features for the resolution, anaphoricity resolution has the biggest effect on coreference chain resolution performance.

    Patient-Reported Reasons for Switching or Discontinuing Statin Therapy : A Mixed Methods Study Using Social Media

    INTRODUCTION: Statin discontinuation can have major negative health consequences. Studying the reasons for discontinuation can be challenging because traditional data collection methods have limitations. We propose an alternative approach using social media. METHODS: We used natural language processing and machine learning to extract mentions of discontinuation of statin therapy from an online health forum, WebMD (http://www.webmd.com). We then extracted data according to themes and identified key attributes of the people posting for themselves. RESULTS: We identified 2121 statin reviews that contained information on discontinuing at least one named statin. Sixty percent of people posting declared themselves as female, and the most common age category was 55-64 years. Over half the people taking statins did so for < 6 months. By far the most common reason given (90%) was patient experience of adverse events, the most common of which were musculoskeletal and connective tissue disorders. The rank order of adverse events reported in WebMD was largely consistent with those reported to regulatory agencies in the US and UK. Data were available on age, sex, duration of statin use, and, in some instances, adverse event resolution and rechallenge. CONCLUSION: Social media may provide data on the reasons for switching or discontinuing a medication, as well as unique patient perspectives that may influence continuation of a medication. This information source may provide unique data for novel interventions to reduce medication discontinuation.

    A Chronological and Regional Analysis of Personal Reports of COVID-19 on Twitter from the UK

    OBJECTIVE: Given the uncertainty about the trends and extent of the rapidly evolving COVID-19 outbreak, and the lack of extensive testing in the United Kingdom, our understanding of COVID-19 transmission is limited. We proposed to use Twitter to identify personal reports of COVID-19 and to assess whether these data can help us understand and model the transmission and trajectory of COVID-19. METHODS: We used a natural language processing and machine learning framework. We collected tweets (excluding retweets) from the Twitter Streaming API that indicated that the user or a member of the user's household had been exposed to COVID-19. The tweets were required to be geo-tagged or to have profile location metadata in the UK. RESULTS: We identified a high level of agreement between personal reports from Twitter and lab-confirmed cases by geographical region in the UK. Temporal analysis indicated that personal reports from Twitter appear up to 2 weeks before UK government lab-confirmed cases are recorded. CONCLUSIONS: Analysis of tweets may indicate trends in COVID-19 in the UK and provide signals of geographical locations where resources may need to be targeted or where regional policies may need to be put in place to further limit the spread of COVID-19. It may also help inform policy makers about which lockdown restrictions are most effective or ineffective.